Effective & Efficient Document Ranking without using a Large Lexicon
Abstract
Although word-based methods are commonly used in document retrieval, they cannot be applied directly to languages that have no explicit word separator. Given a lexicon, words in documents can be identified, but a large lexicon is troublesome to maintain and makes retrieval systems large and complicated. This paper proposes an effective and efficient ranking method that does not use a large lexicon. Words need not be identified during document registration, because a character-based signature file is used as the access structure. During document retrieval, a user request is statistically analyzed to generate an appropriate query, and the query is evaluated efficiently in a word-based manner using the character-based index. We also propose two optimization techniques to accelerate retrieval.
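The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the general idea under stated assumptions. Documents are registered as superimposed-coding signatures of their character unigrams and bigrams, so no lexicon or word segmentation is needed. A query, assumed to be already reduced to a list of words (the paper's statistical analysis of the request and its two optimizations are not shown), is then evaluated word by word against those character signatures. The signature width `SIG_BITS`, the bits per gram `BITS_PER_GRAM`, the blake2b hashing, the text scan that removes false drops, and the IDF-style scoring are all hypothetical choices, not the paper's.

```python
import hashlib
import math

# Hypothetical parameters: the abstract states no signature width or hash count.
SIG_BITS = 512
BITS_PER_GRAM = 3


def char_grams(text: str) -> set[str]:
    """Character unigrams and overlapping bigrams; no lexicon or segmentation.
    Whitespace, if present, only splits the text into runs; unsegmented text is one run."""
    grams: set[str] = set()
    for run in text.split():
        grams.update(run)  # unigrams, so one-character query words still work
        grams.update(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def gram_mask(gram: str) -> int:
    """Superimposed coding: set BITS_PER_GRAM hashed bit positions for one gram."""
    mask = 0
    for k in range(BITS_PER_GRAM):
        digest = hashlib.blake2b(f"{k}|{gram}".encode("utf-8"), digest_size=8).digest()
        mask |= 1 << (int.from_bytes(digest, "big") % SIG_BITS)
    return mask


def signature(text: str) -> int:
    """OR together the masks of every gram (computed once, at registration time)."""
    sig = 0
    for gram in char_grams(text):
        sig |= gram_mask(gram)
    return sig


def may_contain(doc_sig: int, word: str) -> bool:
    """Word-level test on the character index: all bits of the word's signature must be set.
    True may be a false drop; False is always correct."""
    word_sig = signature(word)
    return doc_sig & word_sig == word_sig


def rank(docs: list[str], query_words: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best first, scoring by summed IDF of matched words."""
    sigs = [signature(d) for d in docs]  # a real system would store these, not recompute them
    n = len(docs)
    scores: dict[int, float] = {}
    for w in dict.fromkeys(query_words):  # de-duplicate, keep order
        # Signature filter first, then a text scan to discard false drops (assumed step).
        ids = [i for i, s in enumerate(sigs) if may_contain(s, w) and w in docs[i]]
        if not ids:
            continue
        idf = math.log(n / len(ids)) + 1.0  # +1 so a word present in every document still counts
        for i in ids:
            scores[i] = scores.get(i, 0.0) + idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `rank(["情報検索システム", "文書の順位付け"], ["検索", "システム"])` returns only document 0. Because every gram of a word also occurs in any document that contains that word, the signature filter never discards a true match; it can only let false drops through, and the scan removes them.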
Similar Resources
Automatic Construction of an Opinion-Term Vocabulary for Ad Hoc Retrieval
We present a method to automatically generate a term-opinion lexicon. We also weight these lexicon terms and use them in real time to boost the ranking of documents with opinionated content. We define very simple models both for opinion-term extraction and document ranking. Both the lexicon model and the retrieval model are assessed. To evaluate the quality of the lexicon, we compare performance with a...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Ranking efficient DMUs using the variation coefficient of weights in DEA
One of the difficulties of Data Envelopment Analysis (DEA) is the problem of deficient discrimination among efficient Decision Making Units (DMUs), which yields a large number of DMUs rated as efficient. The main purpose of this paper is to overcome this inability. One of the methods for ranking efficient DMUs is minimizing the Coefficient of Variation (CV) of the input-output weights. In this pape...
Cross Language Text Categorization Using a Bilingual Lexicon
With the Internet growing in popularity at a phenomenal rate, an ever-increasing number of documents in languages other than English are available on the Internet. Cross-language text categorization has attracted more and more attention for organizing these heterogeneous document collections. In this paper, we focus on how to conduct effective cross-language text categorization. To this en...
Learned Lexicon-Driven Interactive Video Retrieval
In this paper, we combine automatic learning of a large lexicon of semantic concepts with traditional video retrieval methods into a novel approach to narrowing the semantic gap. The core of the proposed solution is the automatic detection of an unprecedented lexicon of 101 concepts. From there, we explore the combination of query-by-concept, query-by-example, query-by-keyword, and user in...
Publication date: 1996